Pathway hunting by random survival forests
Abstract
MOTIVATION Pathway or gene set analysis has been widely applied to genomic data. Many current pathway testing methods use univariate test statistics calculated from individual genomic markers and thereby ignore the correlations and interactions between candidate markers. Random forest-based pathway analysis is a promising approach for incorporating complex correlation and interaction patterns, but one limitation of previous approaches is that pathways have been considered separately, so pathway cross-talk information was not taken into account.
RESULTS In this article, we develop a new pathway hunting algorithm for survival outcomes using random survival forests, which prioritizes important pathways by accounting for gene correlation and genomic interactions. We show that the proposed method performs favourably compared with five popular pathway testing methods using both synthetic and real data. We find that the proposed methodology provides an efficient and powerful pathway modelling framework for high-dimensional genomic data.
AVAILABILITY The R code for the analysis used in this article is available upon request.
Related articles
Pathway analysis using random forests with bivariate node-split for survival outcomes
MOTIVATION There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the abilities of these methods in incorporating pathway information for analyzing microarray data. In general, genes that are iden...
High-Dimensional Variable Selection for Survival Data
The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. In big p and small n problems (where p is the dimension and n is the sample size), the distribution of the minimal depth reveals a...
Comparison of Survival Forests in Analyzing First Birth Interval
Background and objectives: The application of statistical machine learning methods, such as ensemble-based approaches, to survival analysis of time-to-event data sets has received considerable interest over the past decades. One of these practical methods is survival forests, which have been developed in a variety of contexts due to their high precision and their non-parametric and non-linear nature. This...
Extent and ecological consequences of hunting in Central African rainforests in the twenty-first century
Humans have hunted wildlife in Central Africa for millennia. Today, however, many species are being rapidly extirpated and sanctuaries for wildlife are dwindling. Almost all Central Africa's forests are now accessible to hunters. Drastic declines of large mammals have been caused in the past 20 years by the commercial trade for meat or ivory. We review a growing body of empirical data which sho...
Comparison of Random Survival Forests for Competing Risks and Regression Models in Determining Mortality Risk Factors in Breast Cancer Patients in Mahdieh Center, Hamedan, Iran
Introduction: Breast cancer is one of the most common cancers among women worldwide. Patients with cancer may die due to disease progression or other types of events. These different event types are called competing risks. This study aimed to determine the factors affecting the survival of patients with breast cancer using three different approaches: cause-specific hazards regression, subdistri...
Journal:
- Bioinformatics
Volume 29, Issue 1
Pages: -
Publication date: 2013